Schrödinger bridge problem via empirical risk minimization
Belomestny, Denis, Naumov, Alexey, Puchkin, Nikita, Suchkov, Denis
We study the Schrödinger bridge problem when the endpoint distributions are available only through samples. Classical computational approaches estimate Schrödinger potentials via Sinkhorn iterations on empirical measures and then construct a time-inhomogeneous drift by differentiating a kernel-smoothed dual solution. In contrast, we propose a learning-theoretic route: we rewrite the Schrödinger system in terms of a single positive transformed potential that satisfies a nonlinear fixed-point equation and estimate this potential by empirical risk minimization over a function class. We establish uniform concentration of the empirical risk around its population counterpart under sub-Gaussian assumptions on the reference kernel and terminal density. We plug the learned potential into a stochastic control representation of the bridge to generate samples. We illustrate the performance of the suggested approach with numerical experiments.
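The ERM step can be illustrated with a toy one-dimensional sketch. Everything below is an illustrative assumption rather than the paper's actual construction: the polynomial function class, the Gaussian reference kernel, and in particular the fixed-point residual used as the risk are stand-ins chosen only to show the shape of the procedure (positive potential, empirical risk, plug-in minimization).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy endpoint samples (1-D).
x = rng.normal(0.0, 1.0, size=200)   # samples from the initial law
y = rng.normal(1.0, 0.5, size=200)   # samples from the terminal law

def k(a, b, sigma=1.0):
    """Gaussian reference kernel (a hypothetical choice)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def features(t):
    """Small polynomial feature map defining the function class."""
    return np.stack([np.ones_like(t), t, t ** 2], axis=-1)

def phi(theta, t):
    """Transformed potential, kept positive via an exponential."""
    return np.exp(features(t) @ theta)

def empirical_risk(theta):
    """Squared residual of a stand-in fixed-point condition
    phi(x_i) ~ mean_j k(x_i, y_j) / phi(y_j), averaged over samples."""
    target = (k(x, y) / phi(theta, y)[None, :]).mean(axis=1)
    return np.mean((phi(theta, x) - target) ** 2)

# Minimize the empirical risk by finite-difference gradient descent.
theta = np.zeros(3)
initial = empirical_risk(theta)
eps, lr = 1e-5, 0.02
for _ in range(400):
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (empirical_risk(theta + e) - empirical_risk(theta - e)) / (2 * eps)
    theta -= lr * grad
final = empirical_risk(theta)
```

In the abstract's pipeline, the minimizer would then be differentiated and plugged into a stochastic control representation of the bridge to produce a sampling drift.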
Machine Unlearning under Overparameterization
Block, Jacob L., Mokhtari, Aryan, Shakkottai, Sanjay
Machine unlearning algorithms aim to remove the influence of specific training samples, ideally recovering the model that would have resulted from training on the remaining data alone. We study unlearning in the overparameterized setting, where many models interpolate the data, and defining the solution as any loss minimizer over the retained set, as in prior work in the underparameterized setting, is inadequate, since the original model may already interpolate the retained data and satisfy this condition. In this regime, loss gradients vanish, rendering prior methods based on gradient perturbations ineffective, motivating both new unlearning definitions and algorithms. For this setting, we define the unlearning solution as the minimum-complexity interpolator over the retained data and propose a new algorithmic framework that only requires access to model gradients on the retained set at the original solution. We minimize a regularized objective over perturbations constrained to be orthogonal to these model gradients, a first-order relaxation of the interpolation condition. For different model classes, we provide exact and approximate unlearning guarantees and demonstrate that an implementation of our framework outperforms existing baselines across various unlearning experiments.
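In the linear-regression special case this idea admits a closed form, since the model gradients at the retained points are just the feature vectors themselves, and the minimum-complexity interpolator is the minimum-norm one. The sketch below (with made-up dimensions and data) projects the original minimum-norm solution onto the row space of the retained design matrix; it is a minimal illustration of the orthogonality-constrained perturbation idea in this special case, not the paper's general algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overparameterized linear regression: d features, fewer samples than d.
d, n_retain, n_forget = 20, 5, 3
X = rng.normal(size=(n_retain + n_forget, d))
w_true = rng.normal(size=d)
y = X @ w_true

# Original model: minimum-norm interpolator of ALL the data.
w0 = np.linalg.pinv(X) @ y

X_r, y_r = X[:n_retain], y[:n_retain]

# Unlearning step: for a linear model, gradients at the retained points
# are the rows of X_r, so perturbations orthogonal to them preserve the
# retained predictions exactly. Minimizing the norm over such
# perturbations projects w0 onto the row space of X_r.
P_row = np.linalg.pinv(X_r) @ X_r      # projector onto row space of X_r
w_unlearned = P_row @ w0

# Reference solution: retrain from scratch on the retained data only.
w_retrain = np.linalg.pinv(X_r) @ y_r
```

For nonlinear models the same orthogonality constraint is only a first-order relaxation of the interpolation condition, so exact agreement with retraining is no longer guaranteed.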
reviewers; we will make sure to update our manuscript accordingly.

1 Comparison with Other Unsupervised Methods (R1)
We would like to thank the reviewers for their comments and suggestions. In particular, TimeNet is a seq2seq method relying on an autoencoding loss and using LSTMs as encoder and decoder. Methods such as TimeNet notably do not scale to long time series (as explained on lines 144-157), unlike ours. However, we did perform experiments on some datasets with different loss variants. We will add insights on this matter to the paper.
Dear Reviewers R1, R2, and R3: Thank you for your comments and suggestions to improve our paper. In contrast, BRN's statistical estimates are based on batches. After tuning its hyperparameters, we observed that it performs worse (Figure 2). ON removes the batch size parameter and introduces two decay rate parameters. We will include this figure in the paper's appendix. Note that this is not the best value observed in the sweep.